    SEAN: Image Synthesis with Semantic Region-Adaptive Normalization

    We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g., FID and PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing: we can interactively edit images by changing segmentation masks or the style for any given region, and we can interpolate styles from two reference images per region. Comment: Accepted as a CVPR 2020 oral paper. The interactive demo is available at https://youtu.be/0Vbj9xFgoU
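
    As a rough illustration of the core idea, the sketch below (PyTorch, with hypothetical names such as RegionStyleNorm) broadcasts per-region style codes over the segmentation mask and turns the result into per-pixel normalization parameters. It is a simplified single-path version; the actual SEAN block additionally blends mask-conditioned and style-conditioned modulation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class RegionStyleNorm(nn.Module):
            # Simplified SEAN-style layer: instance-normalize activations,
            # then modulate them with scale/shift maps predicted from
            # per-region style codes broadcast over the segmentation mask.
            def __init__(self, channels, style_dim):
                super().__init__()
                self.norm = nn.InstanceNorm2d(channels, affine=False)
                self.to_gamma = nn.Conv2d(style_dim, channels, 3, padding=1)
                self.to_beta = nn.Conv2d(style_dim, channels, 3, padding=1)

            def forward(self, x, mask, styles):
                # x: (B, C, H, W); mask: (B, R, H, W) one-hot regions;
                # styles: (B, R, style_dim), one style code per region.
                mask = F.interpolate(mask, size=x.shape[2:], mode='nearest')
                # Paint each region's style code over its own pixels.
                style_map = (styles[..., None, None] * mask[:, :, None]).sum(1)
                h = self.norm(x)
                return h * (1 + self.to_gamma(style_map)) + self.to_beta(style_map)

    Interactive editing then amounts to replacing the mask or the style code of a single region and re-running the generator.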

    Labels4Free: Unsupervised Segmentation using StyleGAN

    We propose an unsupervised segmentation framework for StyleGAN-generated objects. We build on two main observations. First, the features generated by StyleGAN hold valuable information that can be utilized for training segmentation networks. Second, the foreground and background can often be treated as largely independent and swapped across images to produce plausible composited images. For our solution, we propose to augment the StyleGAN2 generator architecture with a segmentation branch and to split the generator into a foreground and a background network. This enables us to generate soft segmentation masks for the foreground object in an unsupervised fashion. On multiple object classes, we report results comparable to state-of-the-art supervised segmentation networks, and a clear improvement over the best unsupervised segmentation approach in both qualitative and quantitative metrics. Project Page: https://rameenabdal.github.io/Labels4Free
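
    The second observation reduces to plain alpha compositing, which the segmentation branch learns to predict; a minimal sketch (generic compositing, not the authors' code) follows.

        import torch

        def composite(fg, bg, alpha):
            # fg, bg: (B, 3, H, W) outputs of the foreground/background
            # branches; alpha: (B, 1, H, W) soft mask in [0, 1].
            return alpha * fg + (1.0 - alpha) * bg

        # Swapping backgrounds across a batch should still yield plausible
        # images, which is the unsupervised training signal exploited here.
        fg = torch.rand(2, 3, 64, 64)
        bg = torch.rand(2, 3, 64, 64)
        alpha = torch.rand(2, 1, 64, 64)
        mixed = composite(fg, torch.roll(bg, shifts=1, dims=0), alpha)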

    Fatigue Life Simulation and Analysis of Aluminum Alloy Sheet Self-piercing Riveting

    A fatigue life prediction model for self-piercing riveting components of aluminum alloy is established, and the effects of roughness and residual stress on the fatigue life of self-piercing riveting components are analyzed with the model. The finite element software ABAQUS and the fatigue analysis software FE-SAFE are used to study these effects through finite element simulation and a multivariate orthogonal regression experiment. Quantitative relations between fatigue life and three variables (roughness, residual stress, and maximum stress) are fitted, and the variation trend of fatigue life with roughness and residual stress is obtained. Among roughness, residual stress, maximum stress, and the two interactions, the strongest influences on fatigue life are, in order: residual stress, the interaction between roughness and residual stress, and roughness. When the maximum stress is fixed, the fatigue life decreases as roughness increases at a given residual stress. The average error between the fatigue experiment results and the simulation results is 9.74%, which indicates that the simulation results are reliable.
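
    The regression step can be illustrated with a small least-squares fit. All numbers below (variable ranges, coefficients, noise) are made up for the sketch; only the model form, log fatigue life against roughness, residual stress, maximum stress, and an interaction term, follows the study.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 30
        Ra    = rng.uniform(0.8, 3.2, n)      # surface roughness, um (assumed range)
        S_r   = rng.uniform(-60.0, 60.0, n)   # residual stress, MPa (assumed range)
        S_max = rng.uniform(150.0, 210.0, n)  # maximum stress, MPa (assumed range)

        # Synthetic ground truth, only to exercise the fit; the paper's real
        # coefficients and measured lives are not reproduced here.
        logN = (6.0 - 0.12 * Ra - 0.004 * S_r - 0.006 * S_max
                - 0.002 * Ra * S_r + rng.normal(0.0, 0.02, n))

        # Design matrix: intercept, main effects, and the interaction the
        # study singles out (roughness x residual stress).
        X = np.column_stack([np.ones(n), Ra, S_r, S_max, Ra * S_r])
        coef, *_ = np.linalg.lstsq(X, logN, rcond=None)
        for name, c in zip(['1', 'Ra', 'S_r', 'S_max', 'Ra*S_r'], coef):
            print(f'{name:>6}: {c:+.4f}')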

    3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

    Modern 3D-GANs synthesize geometry and texture by training on large-scale datasets with a consistent structure. Training such models on stylized, artistic data, where the geometry is highly variable and the camera information is often unknown, has not yet been shown to be possible. Can we train a 3D GAN on such artistic data while maintaining multi-view consistency and texture quality? To this end, we propose an adaptation framework in which the source domain is a pre-trained 3D-GAN and the target domain is a 2D-GAN trained on artistic datasets. We then distill the knowledge from the 2D generator to the source 3D generator. To do so, we first propose an optimization-based method to align the distributions of camera parameters across domains. Second, we propose regularizations necessary to learn high-quality texture while avoiding degenerate geometric solutions, such as flat shapes. Third, we show a deformation-based technique for modeling the exaggerated geometry of artistic domains, which, as a byproduct, enables personalized geometric editing. Finally, we propose a novel inversion method for 3D-GANs that links the latent spaces of the source and target domains. Our contributions, for the first time, allow for the generation, editing, and animation of personalized artistic 3D avatars on artistic datasets. Comment: Project Page: https://rameenabdal.github.io/3DAvatarGAN
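
    A hedged sketch of the distillation loop is given below. The interfaces (g3d, g2d_teacher, cam_sampler, an LPIPS-style perceptual loss) and the shared-latent pairing are illustrative placeholders, not the paper's exact training recipe.

        import torch

        def distill_step(g3d, g2d_teacher, lpips, cam_sampler, opt, batch=4,
                         device='cuda'):
            # Assumed setup: the student 3D generator and the frozen 2D
            # teacher share a latent space, and cameras are drawn from the
            # aligned camera distribution found in the first step.
            z = torch.randn(batch, g3d.z_dim, device=device)
            cams = cam_sampler(batch)            # aligned camera parameters
            student = g3d(z, cams)               # rendered 3D view
            with torch.no_grad():
                teacher = g2d_teacher(z)         # stylized 2D teacher image
            loss = lpips(student, teacher).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()

    In the full method this loss would be combined with the texture and geometry regularizations described above.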

    BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

    We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of multi-modality machine translation. Compared with the widely used How2 and VaTeX datasets, BigVideo is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of video. We also introduce two deliberately designed test sets to verify the necessity of visual information: Ambiguous, which contains ambiguous words, and Unambiguous, in which the textual context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on BigVideo show that: a) visual information consistently improves the NMT model in terms of BLEU, BLEURT, and COMET on both the Ambiguous and Unambiguous test sets; b) visual information helps disambiguation compared to the strong text baseline on terminology-targeted scores and in human evaluation. The dataset and our implementations are available at https://github.com/DeepLearnXMU/BigVideo-VMT. Comment: Accepted to ACL 2023 Findings
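
    The contrastive component can be sketched as a symmetric InfoNCE loss over paired text/video embeddings; the temperature and the assumption of a shared embedding space are illustrative choices, not taken from the paper.

        import torch
        import torch.nn.functional as F

        def cross_modal_contrastive_loss(text_emb, video_emb, temperature=0.07):
            # text_emb, video_emb: (B, D) pooled encoder outputs for the
            # B matched subtitle/video pairs in a batch.
            text_emb = F.normalize(text_emb, dim=-1)
            video_emb = F.normalize(video_emb, dim=-1)
            logits = text_emb @ video_emb.t() / temperature   # (B, B)
            targets = torch.arange(logits.size(0), device=logits.device)
            # Matched pairs sit on the diagonal; score retrieval both ways.
            return 0.5 * (F.cross_entropy(logits, targets)
                          + F.cross_entropy(logits.t(), targets))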